ABSTRACT

Data warehouses are the core of decision support
systems, which nowadays are used by all kinds of
enterprises throughout the world. Although many studies
have been conducted on the need for decision support
systems (DSSs) in small businesses, most of them adopt
existing solutions and approaches that are appropriate
for large-scale enterprises but inadequate for small
and middle-sized enterprises.

Small enterprises require cheap, lightweight architec- 
tures and tools (hardware and software) providing on- 
line data analysis. In order to ensure these features, we 
review web-based business intelligence approaches. For 
real-time analysis, the traditional OLAP architecture is 
cumbersome and storage-costly; therefore, we also re- 
view in-memory processing. 

Consequently, this paper discusses the existing approa- 
ches and tools working in main memory and/or with 
web interfaces (including freeware tools), relevant for 
small and middle-sized enterprises in decision making. 

1. INTRODUCTION 

During the last decade, data warehouses (DWs) 
have become an essential component of modern de- 
cision support systems in most companies of the 
world. In order to be competitive, even small and 
middle-sized enterprises (SMEs) now collect large 

*This author also works at the Kharkiv National Uni- 
versity of Economics, 9-a, pr. Lenina, 61001 Kharkov, 
Ukraine 



volumes of information and are interested in business
intelligence (BI) systems [26]. SMEs are regarded as
significantly important on a local, national or even
global basis, and they play an important part in any
national economy [34]. In spite of their multiple
advantages, existing DSSs frequently remain inaccessible
or insufficient for SMEs because of the following
factors:

• high price; 

• high requirements for a hardware infrastruc- 
ture; 

• complexity for most users; 

• irrelevant functionality; 

• low flexibility to deal with a fast-changing,
dynamic business environment [56];

• little attention to the differences in data access
needs between SMEs and large-scale enterprises.

In addition, many projects fail due to the complexity
of the development process. Moreover, as the work
philosophies of small and large-scale enterprises are
considerably different, it is not advisable to use tools
intended for large-scale enterprises. In short, "one
size does not fit all" 'ST. Furthermore, identifying the
information needs of potential users while building a
data warehouse raises many problems "7 .
Thereby, SMEs require lightweight, cheap, flexible, 



simple and efficient solutions. To achieve these
features, we can take advantage of light clients with
web interfaces. For instance, web technologies are
utilized for data warehousing by large corporations,
but there is an even greater demand for such systems
among small and middle-sized enterprises. Web
technologies provide cheap software, because they
eliminate the need for numerous dispersed applications
and for the deployment and maintenance of a corporate
network, and they reduce training time. Web-based
solutions are simple for end-users to utilize. In
addition, a web-based architecture requires only
lightweight software clients (i.e., web browsers).

Besides, there is a need for real-time data analysis,
which raises memory and storage issues. Traditional
OLAP (On-Line Analytical Processing) tools are often
based on a cumbersome hardware and software
architecture, so they require significant resources to
provide high performance. Their flexibility is limited
by data aggregation. At the same time, in-memory
databases provide significant performance improvements.
The absence of disk I/O operations permits fast query
response times. In-memory databases do not require
indexes, recalculation or pre-aggregation; the system
thus becomes more flexible, because analysis is possible
down to a detailed level without predefining it.
Moreover, according to analyst firms, "by 2012, 70% of
Global 1000 organizations will load detailed data into
memory as the primary method to optimize BI application
performance" [47].

Thus, our objective is to propose original and adapted
BI solutions for SMEs. To this end, as a first step, we
review in this paper the existing research related to
this issue.

The remainder of this paper is organized as follows.
In Section 2, we first present and discuss web-based
BI approaches, namely web data warehouses and
web-based open source software for data warehousing.
In Section 3, we review in-memory BI solutions
(MOLAP, vector database-based BI software) and the
technologies that can support them (in-memory and
vector databases). We finally conclude this paper
in Section 4 and provide our view on how the research
and technologies surveyed in this paper can be
enhanced to fit SMEs' BI needs.

2. WEB-POWERED BI 

The Web has become the platform of choice for
the delivery of business applications for large-scale
enterprises as well as for SMEs. Web warehousing
is a recent approach that merges data warehousing
and business intelligence systems with web
technologies [S2] . In this section, we present and
discuss web data warehousing approaches, their
features, advantages and possibilities, as well as
their necessity and potential for SMEs.

2.1 Web warehousing 

2.1.1 General information 

There are two basic definitions of web warehous- 
ing. The first one simply states that web ware- 
houses use data from the Web. The second con- 
centrates on the use of web technologies in data 
warehousing. We focus on the second definition in this
paper.

Web data warehouses inherit many characteristics from
traditional data warehouses: data are organized around
the major subjects of the enterprise; information is
aggregated and validated; data are represented by time
series, not by current status.
Web-based data warehouses nonetheless differ from
traditional DWs. Web warehouses organize and manage
the stored items, but do not collect them [5]. Web-based
DW technology changes the way users access the DW:
instead of accessing it through a LAN (Local Area
Network), users access it via the Internet or an
intranet [30].
Specific issues raised by web-based DWs include
unrealistic user expectations, especially in terms of
how much information users want to be able to access
from the Web; security issues; and technical
implementation problems related to peak demand and
load [42].

Ultimately, web technologies make data warehouses
and decision support systems friendlier to users.
They are often used in data warehouses only to
visualize information [18]. At the same time, web
technology opens up multiple information formats,
such as structured, semi-structured and unstructured
data, to end-users. This gives users many
possibilities, but also creates a problem known as
data heterogeneity management [19].
Another important issue is the need to view the Web
as an enormous source of business data, without which
enterprises lose many opportunities. Owing to the Web,
business analysts can access a large amount of
information external to the enterprise; they can then
study competitors' movements by analyzing their web
site content, and analyze customer preferences or
emerging trends [H]. So, e-business technologies
are expected to allow SMEs to gain capabilities that
were once the preserve of their larger competitors
[34]. However, most of the information on the Web
is unstructured and heterogeneous, and hence difficult
to analyze [26].



Among the web technologies used in data warehousing,
we can single out web browsers, web services and
XML. Using a web browser offers several advantages
over traditional warehouse interface tools (THl [23]:

• cheapness and simplicity of web browser
installation and use;

• reduction of system training time;

• elimination of problems posed by operating
systems;

• low cost of deployment and maintenance;

• elimination of the need for numerous dispersed
applications;

• possibility to open the data warehouse to business
partners over an extranet.

Web warehouses can be divided into two classes:
XML document warehouses and XML data warehouses.
We present them in Sections 2.1.2 and 2.1.3,
respectively. We also introduce OLAP on XML data
(XOLAP) in Section 2.1.4. We finish this section with
the web-based paradigm known as cloud computing
(Section 2.1.5). Finally, Section 2.2 presents
web-based open source software for data warehousing
analysis.

2.1.2 XML document warehouses 

An XML document warehouse is a software frame- 
work for analyzing, sharing and reusing unstruc- 
tured data (texts, multimedia documents, etc.). Un- 
structured data processing takes an important place 
in enterprise life because unstructured data are larger 
in volume than structured data, are more difficult 
to analyze, and are an enormous source of raw in- 
formation. 

Representing unstructured or semi-structured data 
with traditional data models is very difficult. For 
example, relational models such as star and snowflake 
schemas are semantically poor for unstructured data. 
Thus, Nassis et al. utilize object-oriented concepts 
to develop a conceptual model for XML document 
warehouses [3S]. They use UML diagrams to build 
hierarchical conceptual views. By combining
object-oriented concepts and XML Schema, they
build the xFACT repository.

2.1.3 XML data warehouses 

In contrast to XML document warehouses, XML
data warehouses focus on structured data. An XML
data warehouse can be designed from XML sources
[3]. In this case, it is necessary to translate XML
data into a relational schema using XML Schema [3, 8].
Xyleme is one of the first projects aimed at XML
data warehouse design [ST] . It collects and archives
web XML documents into a dynamic XML warehouse.

Some more recent approaches are based on classical
warehouse schemas. Pokorny adapts the traditional
star schema with explicit dimension hierarchies for
XML environments by using Document Type Definitions
(DTDs) [41]. Boussaïd et al. define data warehouse
schemas via XML Schema in a methodology named
X-Warehousing [8]. Golfarelli proposes a
semi-automatic approach for building the conceptual
schema of a data mart starting directly from XML
sources [15]. This work elaborates the concept of the
Dimensional Fact Model. Baril and Bellahsene propose
a View Model from XML Documents implemented in the
DAWAX (Data Warehouse for XML) system 0] . Its view
specification mechanism allows filtering the data to
be stored. Nørvåg introduces a temporal XML data
warehouse to query historical document versions and
the changes between document versions [36]. Nørvåg
et al. also propose TeXOR, a temporal XML database
system built on top of an object-relational database
system ^7]. Finally, Zhang et al. propose an approach,
named X-Warehouse, to materialize data warehouses
based on frequent query patterns represented by
Frequent Pattern Trees [58].

2.1.4 XOLAP 

Some recent research attempts to perform OLAP
analysis over XML data. In order to support OLAP
queries and to construct complex analytic queries,
some researchers extend the XQuery language with
aggregation features [5].
Wiwatwattana et al. also introduce an XQuery
cube operator, X^3 [5S]. Hachicha et al. propose
a similar operator, but based on TAX (Tree
Algebra for XML) [17].
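
As a rough illustration of what a cube-style aggregation over XML data
computes, the sketch below sums a measure for every subset of a set of
dimensions. It relies on Python's standard xml.etree.ElementTree module
rather than on XQuery, X^3 or TAX, and it is not the operator proposed
in the cited works; the element names (sale, city, year, amount) and
the function xml_cube are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import combinations

# Hypothetical XML facts: each <sale> carries two dimensions and one measure.
XML_FACTS = """
<sales>
  <sale><city>Lyon</city><year>2009</year><amount>10</amount></sale>
  <sale><city>Lyon</city><year>2010</year><amount>5</amount></sale>
  <sale><city>Paris</city><year>2010</year><amount>7</amount></sale>
</sales>
"""

def xml_cube(xml_text, dimensions, measure):
    """Sum `measure` for every subset of `dimensions` (a full cube)."""
    root = ET.fromstring(xml_text)
    cube = defaultdict(float)
    for fact in root:
        value = float(fact.findtext(measure))
        # Each fact contributes to one group per subset of dimensions,
        # from the grand total () up to the most detailed grouping.
        for size in range(len(dimensions) + 1):
            for group in combinations(dimensions, size):
                key = tuple((d, fact.findtext(d)) for d in group)
                cube[key] += value
    return dict(cube)

cube = xml_cube(XML_FACTS, ("city", "year"), "amount")
```

In this toy setting, the key () holds the grand total, while a key such
as (("city", "Lyon"),) holds the total for one dimension value.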

2.1.5 Cloud computing 

Another increasingly popular web-based solution
is cloud computing. Cloud computing provides access
to large amounts of data and computational resources
through a variety of interfaces [38]. Resources are
provided as services via the cloud (the Internet).
These services, delivered through data centers, are
accessible anywhere. Besides, they allow the rise of
cloud analytics [2 .

The main consumers of cloud computing are small
enterprises and startups that do not have a legacy of
IT investments to manage [50]. Cloud computing-based
BI tools are rather cheap for small and middle-sized
enterprises, because they require no hardware and
software maintenance yij and their prices increase
according to the required data storage.
In contrast, cloud computing does not allow users
to physically possess their data storage. This causes
user dependence on the cloud computing provider, as
well as loss of control over data and data security
risks. In conclusion, most cloud computing-based BI
tools do not fit enterprise requirements yet.

2.1.6 Discussion 

Data storage and analysis interface solutions should
be easily deployed in a small organization at low
cost, and thus be based on web technologies such
as XML and web services. Web warehousing is a
rather recent but popular direction that provides
many advantages, especially in data integration.
Web-based tools provide a light interface. Thereby,
their usage by small and middle-sized enterprises is
limited. Existing cloud-based BI tools are appropriate
for small and middle-sized enterprises with respect to
price and flexibility. However, they are not yet
enterprise-friendly and are in need of data security
enhancements.

2.2 Web-based open source software 

In this section, we focus on ETL (Extraction Trans- 
formation Loading) tools, OLAP servers and OLAP 
clients. Their characteristics are summarized in Ta- 
ble 1. 

2.2.1 ETL 

Web-based free ETL tools are in most cases ROLAP
(Relational OLAP, discussed in Section 3.1.1)-oriented.
ROLAP-oriented ETL tools allow users to define and
create data transformations in Java (JasperETL) or in
TL (Clover.ETL)¹. The only MOLAP (Multidimensional
OLAP, discussed in Section 3.1.1)-oriented ETL tool,
Palo, defines the ETL process either via web interfaces
or via XML structures for experts. All studied ETL
tools handle heterogeneous data sources and complex
file formats. They interact with different DBMSs
(DataBase Management Systems). Some of the tools
can also extract data from ERP (Enterprise Resource
Planning) and CRM (Customer Relationship Management)
systems [53].

2.2.2 OLAP 

In this section, we review OLAP servers as well
as OLAP clients. All studied OLAP servers use
the MDX (MultiDimensional eXpressions) language
for aggregating tables. They parse MDX into SQL
to retrieve answers to dimensional queries. All
reviewed OLAP servers exist for Java, but Palo
also exists for .NET, PHP, and C. Moreover, Palo
is an in-memory Multidimensional OLAP database
server². Mondrian schemas are represented in XML
files³. The Mondrian Pentaho Server is used by
different OLAP clients, e.g., FreeAnalysis.
All studied OLAP clients are Java applications. They
usually run on the client, but tools that run on web
servers also exist [53]. So far, only PocOLAP is a
lightweight, open source OLAP solution.

¹ http://www.cloveretl.com
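
The translation of a dimensional query into SQL, as performed by
ROLAP-based servers, can be pictured with the toy example below. It
does not parse real MDX and does not reproduce the translator of
Mondrian or of any other tool: the star-schema table sales_fact, its
columns and the helper to_sql are assumptions made for illustration,
and Python's sqlite3 module stands in for the relational back end.

```python
import sqlite3

def to_sql(measure, group_by, slicer=None):
    """Build a GROUP BY query for a dimensional request (toy translation).

    Identifiers are interpolated directly, so they are assumed to come
    from a trusted schema description; slicer values are bound as
    parameters.
    """
    cols = ", ".join(group_by)
    sql = f"SELECT {cols}, SUM({measure}) FROM sales_fact"
    params = []
    if slicer:
        sql += " WHERE " + " AND ".join(f"{c} = ?" for c in slicer)
        params = list(slicer.values())
    sql += f" GROUP BY {cols} ORDER BY {cols}"
    return sql, params

# Hypothetical fact table held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (city TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [("Lyon", 2009, 10.0), ("Lyon", 2010, 5.0), ("Paris", 2010, 7.0)],
)

# "Amount by city, sliced on year 2010" becomes one SQL aggregation.
sql, params = to_sql("amount", ["city"], slicer={"year": 2010})
rows = conn.execute(sql, params).fetchall()
```

Each change of analysis axes leads to a new SQL aggregation over the
detailed data, which is one reason why ROLAP is generally slower than
MOLAP (see Table 2).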

2.2.3 Discussion 

The industrial use of open source business
intelligence tools is becoming increasingly common, but
it is still not as widespread as for other types of
software [53]. Moreover, freeware OLAP systems
often offer simple web-based interfaces. In addition,
there are some web-based open source BI tools
that work in memory.

Nowadays, there are three complete solutions
including ETL and OLAP: Talend OpenStudio, Mondrian
Pentaho and Palo.

Among ETL tools, only Palo is MOLAP-oriented.
Not all of these tools provide free graphical user
interfaces. All three presented ETL tools support
Java. They can be deployed on different platforms.

Free web-based OLAP servers are used by different
OLAP clients. The most extensive and widely used
is the Mondrian Pentaho Server, due to its
functionality. All studied OLAP clients are Java
applications. Most of them can be used with XMLA (XML
for Analysis)-enabled sources, but they lack
documentation.

Generally, the studied web-based tools provide
sufficient functionality, but they remain cumbersome
due to their use of traditional OLAP.

3. IN-MEMORY BI SOLUTIONS 

In the late eighties, main memory databases were
researched by numerous authors. Thereafter, they were
rarely discussed because of the technological limits of
the time, but nowadays they are regaining an important
place in database technologies.

3.1 MOLAP 

3.1.1 OLAP and MOLAP 

Before studying existing MOLAP approaches, we
review general OLAP principles and definitions. The
OLAP concept was introduced in 1993 by Codd.
OLAP is an approach to quickly answer
multidimensional analytical queries [IS . In OLAP, a
dimension is a sequence of analyzed parameter values.
An important goal of multidimensional modeling is to
use dimensions to provide as much context as possible
for facts [H]. Combinations of dimension values define
a cube's cell. A cube stores the results of different
calculations and aggregations.
There are three variants of OLAP: MOLAP, ROLAP and
Hybrid OLAP (HOLAP). We compare these approaches in
Table 2.

² http://www.jedox.com/en/products/palo_olap_server
³ http://mondrian.pentaho.org/

Category      Tool             Platform  License  Particular features
------------  ---------------  --------  -------  --------------------------------------
ETL           Clover.ETL       Java      LGPL     does not have an open source GUI; uses
                                                  its own TL language for data
                                                  transformations
ETL           JasperETL        Java      GPL      generated code: Java or Perl; can use
                                                  CRM systems as data sources
ETL (MOLAP)   Palo ETL Server  Java      GPL      does not have a GUI for the time being;
                                                  parallel jobs are not supported
OLAP servers  Mondrian         Java      GPL      ROLAP-based; data cubes via XML
OLAP servers  Palo             Java      GPL      MOLAP-based; works in memory; data
                                                  cubes via Excel add-in
OLAP clients  FreeAnalysis     Java      MPL      works with servers that use XMLA,
                                                  e.g., Mondrian
OLAP clients  JPalo            Java      GPL      works with the Palo server
OLAP clients  PocOLAP          Java      LGPL

Table 1: Web-based open source software

Compared to ROLAP and HOLAP, MOLAP provides faster
computation and querying [48], since all required data
are stored in the OLAP server. Moreover, it provides
more space-efficient storage

m- 

Since the purpose of MOLAP is to support decision
making and management, data cubes must contain
sufficient information to support decision making and
meet every user expectation. In this context,
researchers try to improve three main aspects:
response time (through new aggregation algorithms
[28] and new operators [46]), query personalization,
and data analysis visualization [26].
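
To make the notion of a cube cell more concrete, the sketch below
stores a toy MOLAP cube as a flat array in which each combination of
dimension values addresses exactly one slot. The class DenseCube, its
methods and the sample dimensions are hypothetical; real MOLAP servers
use far more elaborate storage structures.

```python
class DenseCube:
    """Toy MOLAP cube: one array slot per combination of dimension values."""

    def __init__(self, dimensions):
        # dimensions maps a dimension name to its ordered list of values.
        self.names = list(dimensions)
        self.index = {name: {v: i for i, v in enumerate(values)}
                      for name, values in dimensions.items()}
        self.sizes = [len(dimensions[name]) for name in self.names]
        total = 1
        for size in self.sizes:
            total *= size
        self.cells = [None] * total  # every possible cell is allocated

    def _offset(self, coords):
        # Row-major position of the cell addressed by the coordinates.
        offset = 0
        for name, size in zip(self.names, self.sizes):
            offset = offset * size + self.index[name][coords[name]]
        return offset

    def put(self, value, **coords):
        self.cells[self._offset(coords)] = value

    def get(self, **coords):
        return self.cells[self._offset(coords)]

    def fill_ratio(self):
        # Share of allocated cells that actually hold a value.
        return sum(c is not None for c in self.cells) / len(self.cells)

cube = DenseCube({"city": ["Lyon", "Paris"], "year": [2009, 2010]})
cube.put(10.0, city="Lyon", year=2009)
cube.put(7.0, city="Paris", year=2010)
```

The array gives direct access to any cell, but empty combinations still
occupy a slot, which hints at the poor storage utilization of MOLAP on
sparse data sets.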

3.1.2 Storage methods 

Researchers interested in MOLAP focus heavily on
storage techniques. In addition, most researchers
choose MOLAP as the most suitable OLAP technique for
storage [31], although MOLAP requires significant
storage capacity. According to Kudryavcev, there are
three basic types of storage methods: semantic,
syntactical and approximate [23]. Syntactical
approaches transform only data storage schemas.
Semantic storage techniques transform cube structures.
Approximate storage techniques compress initial data.
One semantic storage technique is the Quotient Cube.
It consists in a semantic compression that partitions
the set of cells of a cube into equivalence classes,
while keeping the cube's roll-up and drill-down
semantics and lattice structure [25]. The main
objective of approximate storage techniques such as
Wavelets is range-sum query optimization [29, . In the
syntactical approach DWARF, a cube is compressed by
deleting redundant information [49]. Data are
represented as graphs with keys and pointers in graph
nodes. Data redundancy is reduced by improving
addressing and data storage.

3.1.3 Schema evolution 

Many works address the problem of schema evolution,
because working only with the latest version hides
the existence of information that may be critical for
data analysis. These studies can be classified into
two groups: updating models (mapping data into the
latest version) and tracking history models (saving
schema evolution). Other approaches look at the
possibility for users to choose which presentation
they want for query responses. For instance, Body et
al. proposed a novel temporal multidimensional model
for supporting evolutions of multidimensional
structures by introducing a set of temporal modes of
presentation for dimensions in a star schema.
3.1.4 Discussion 

Multidimensional OLAP is appropriate for decision
making. It offers a number of advantages, including
automatic aggregation, visual querying, and good query
performance due to the use of pre-aggregation [39].
Besides, MOLAP may be a good solution for situations
in which small to medium-sized DBs are the norm and
application software speed is critical [IS], because
loading all data into the multidimensional format
requires neither significant time nor disk space.
Nevertheless, MOLAP systems have several problems due
to their complexity, the time they consume and the
need for an expert to rebuild cubes. If the user wants
to change dimensions, the whole deployment process
needs to be redone (datamart schema, ETL process,
etc.) [56]. Furthermore, the cost of MOLAP tools does
not fit the needs of small and middle-sized
enterprises. In addition, MOLAP-based systems may
encounter significant scalability problems. Moreover,
MOLAP requires a cumbersome architecture, i.e.,
significant software and hardware needs, significant
changes in work processes to generate substantial
benefits [32], and a considerable deployment time.

Table 2: Comparison of OLAP technologies

               MOLAP                    ROLAP                    HOLAP
-------------  -----------------------  -----------------------  --------------------------------
Data storage   Multidimensional         Relational database      Uses MOLAP technology to store
               database                                          higher-level summary data, a
                                                                 ROLAP system to store detailed
                                                                 data

Result sets    Stores in a MOLAP cube   Stores no result sets    Stores result sets, but not all

Capacity       Requires significant     Requires the least       Compromise between performance,
               capacity                 storage capacity         capacity, and permutations of
                                                                 dimensions available to a user

Performance    The fastest performance  The slowest performance

Dimensions     Minimum number           Maximum number

Vulnerability  Provides poor storage    Database designs
               utilization, especially  recommended by ER
               when the data set is     diagrams are
               sparse                   inappropriate for
                                        decision support
                                        systems

Advantages     Fast query performance;  No limitation on data    Fast access at all levels of
               automated computation    volume; leverages        aggregation; compact aggregate
               of higher-level          functionalities          storage; dynamically updated
               aggregates of the data;  inherent in relational   dimensions; easy aggregate
               array model provides     databases                maintenance
               natural indexing

Disadvantages  Data redundancy;         Slow performance         Complexity: a HOLAP server must
               querying models with                              support MOLAP and ROLAP engines,
               dimensions of high                                tools to combine storage engines
               cardinality is                                    and operations. Functionality
               difficult                                         overlap between storage and
                                                                 optimization techniques in ROLAP
                                                                 and MOLAP engines.

3.2 Main Memory Databases 

3.2.1 General information 
Main Memory Databases (MMDBs) entirely reside in
main memory [14] and only use a disk subsystem for
backup [TQ . The concept of managing an entire
database in main memory has been researched for over
twenty years, and the benefits of such approaches have
been well understood in certain domains, such as
telecommunications, securities trading, applications
handling high data traffic (e.g., routers) and
real-time applications. However, it is only recently,
with decreasing memory prices and the availability of
64-bit operating systems, that the size restrictions
on in-memory databases have been removed and in-memory
data management has become available for many
applications [27][54]. When the assumption of
disk-residency is removed, complexity is dramatically
reduced. The number of machine instructions drops,
buffer pool management disappears, extra data copies
are not needed, index pages shrink, and their
structure is simplified. Design becomes simpler and
more compact, and queries are executed faster [54].
Consequently, the use of main memory databases becomes
advantageous in many cases: for hot data (frequently
accessed, low data volume), for cold data (rarely
accessed, in the case of voluminous data), and in
applications requiring a short access/response time.
A second wave of applications using MMDBs is
currently appearing, e.g., FastDB, Dali from AT&T
Bell Labs, and TimesTen from Oracle. These systems
are widely used in many applications such as HP
intellect web flat already, Cisco VoIP call Proxy,
the telecom systems of Alcatel and Ericsson, and so
on [H]. The high demand for MMDBs is driven by
the need for high reliability, high real-time
capacity and high information throughput

m- 

MMDBs have some advantages, including short response
times and good transaction throughput. MMDBs
also leverage the decreasing cost of main memory.
In contrast, MMDB size is limited by the size of RAM
(Random Access Memory). Moreover, since data in
main memory can be directly accessed by the
processor, MMDBs suffer from data vulnerability, i.e.,
the risk of accidental data loss due to software
errors [T^, hardware failure or other hazards.

3.2.2 MMDB issues 

Although in-memory technologies provide high
performance, scalability and flexibility to BI tools,
there are still some open issues. MMDBs work in
memory; therefore, the main problems and challenges
are recovery, commit processing, access methods
and storage.

There is no doubt that backups of memory-resident
databases must be maintained on storage other
than main memory in order to ensure data integrity.
In order to protect against failures, it is necessary
to have a backup copy and to keep a log of transaction
activity [14]. In addition, recovery processing is
usually the only MMDB component that deals with
disk I/O, so it must be designed carefully [20].
Existing research works do not share a common view
of this problem. Some authors propose to use a
part of stable main memory to hold the log. This
provides short response times, but causes a problem
when logs are large. So, it is used for precommitted
transactions. Group commits (e.g., a causal
commit protocol [27]) allow accumulating several
transactions in memory before flushing them to the
log disk. Nowadays, commit processing is especially
important in distributed database systems because
it is slow, since disk logging takes place at
several sites [27].
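
The idea behind group commit can be sketched as follows: commit
records are accumulated in memory and written to the log disk in one
operation per group. This is a simplified illustration and not the
protocol of [27]; the class GroupCommitLog, its group_size parameter
and the list that simulates the log disk are assumptions.

```python
class GroupCommitLog:
    """Toy group commit: buffer commit records, flush them in one write."""

    def __init__(self, group_size):
        self.group_size = group_size
        self.pending = []   # commit records still only in memory
        self.disk = []      # stands in for the log disk
        self.flushes = 0    # number of simulated disk writes

    def commit(self, txn_id, record):
        # The transaction is buffered; it becomes durable only once
        # its group has been flushed.
        self.pending.append((txn_id, record))
        if len(self.pending) >= self.group_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.disk.extend(self.pending)  # one write for the whole group
            self.pending.clear()
            self.flushes += 1

log = GroupCommitLog(group_size=3)
for txn in range(7):
    log.commit(txn, f"update-{txn}")
log.flush()  # force out the last, incomplete group
```

With a group size of three, seven transactions cost three simulated
disk writes instead of seven, at the price of a delay before each
transaction is durable.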

Several different approaches to data storage exist for
MMDBs. Initially, there were many attempts to use
database partitioning techniques developed earlier
for other types of databases. Gruenwald and Eich
divide existing techniques as follows: horizontal
partitioning, group partitioning, single vertical
partitioning, group vertical partitioning, and
mixed partitioning [16]. Only horizontal and single
vertical partitioning are suitable for MMDBs and,
as a result of this study, single vertical partitioning
was chosen as the most efficient [10]. B-trees and
hashing are also identified as appropriate storage
techniques for MMDBs. Hashing is not as space-efficient
as a tree, so it is rarely used [43]. Finally,
most researchers agree on T-trees (a balanced
index tree data structure optimized for cases where
both the index and the actual data are fully kept
in memory) as the main storage technique [HJ [T3J
144] . T-trees indeed require less memory space and
fewer CPU cycles than B-trees, so indexes are more
economical.
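
The difference between horizontal and single vertical partitioning
can be illustrated on a small in-memory table. The sketch assumes that
single vertical partitioning places each attribute in its own
partition, which is our reading of the term; the sample records and
both helper functions are hypothetical.

```python
rows = [
    {"id": 1, "name": "Ann", "city": "Lyon"},
    {"id": 2, "name": "Bob", "city": "Paris"},
    {"id": 3, "name": "Eve", "city": "Lyon"},
    {"id": 4, "name": "Dan", "city": "Nice"},
]

def horizontal_partition(records, n_parts):
    """Split whole records into n_parts groups (round-robin on position)."""
    parts = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].append(record)
    return parts

def single_vertical_partition(records):
    """Store each attribute in its own list, aligned by record position."""
    columns = {}
    for record in records:
        for attr, value in record.items():
            columns.setdefault(attr, []).append(value)
    return columns

h_parts = horizontal_partition(rows, 2)
v_parts = single_vertical_partition(rows)
```

A query that reads a single attribute only touches one list of
v_parts, whereas it has to visit every record of every horizontal
partition.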

The above-mentioned issues are important for BI
environments: data coherence is strategic, and
performance is fundamental for on-line operations such
as OLAP. Choosing the right storage and recovery
techniques is crucial, as a wrong choice can damage
data security and data integrity.

3.2.3 MMDB Systems 

In this section, we give an overview of MMDB systems.
We particularly focus our discussion on the
most recent systems, such as Dali, FastDB, Kdb,
IBM Cognos TM1 and TimesTen.
Among the studied systems, we can distinguish a
storage manager (the Dali system [2^) and complete
main memory database systems (FastDB, Kdb,
TM1, TimesTen). Interfaces can be based on a
zero-footprint Web interface (IBM Cognos TM1⁴),
standard SQL (TimesTen) [51] or C++ (FastDB) [22].
Most MMDBs feature SQL or an SQL-like query language
(FastDB, TimesTen). The Kdb system uses its own
language, "q", for programming and querying [24].
IBM Cognos 8 BI and TimesTen are aimed at decision
making in large corporations. The main MMDB
disadvantages are the absence of interprocess
communication and high storage requirements (Dali
system) [9], limited server memory (TimesTen), and
the lack of support for a client-server architecture
(FastDB).

3.2.4 Discussion 

The main benefit of using MMDBs is short
access/response time and good transaction throughput.
But MMDBs are hampered by data vulnerability and
security problems. Memory is not persistent, which
means data loss in case of failure on the server.
Security problems come from unauthorized access to
data aimed at data corruption or theft.
So far, MMDBs are mainly used in real-time
applications and telecommunications, but are not
commonly used for decision making.

In spite of considerable research on MMDBs, there
are some unresolved issues such as data security and
safety and data processing efficiency.

⁴ www-01.ibm.com/software/data/cognos/products/tm1

3.3 Vector Databases 

3.3.1 General information 

A vector table is built by transforming a file in
the following way: every record represents all column
values in a vector. Vector databases (VDBs) require
neither indexes nor any complex database structure.
Differences between vector and relational databases
are summarized in table 2. In order to access data,
relational DBMSs provide only sequential scans by
columns and by rows, whereas VDBs provide fast data
access. Besides, relational DBs store large volumes
of repeated data due to the nature of data. For
example, in a table of students, the French
nationality can be repeated a great number of times.
In contrast, in VDBs, this value is stored only once,
which provides significant data compression.

The main principles of vector databases are data
associations and data access by pointers. Vector
database implementations eliminate data redundancy,
because any possible piece of data is written once
and is never repeated. Metadata such as keys in the
relational data model lose their interest in VDBs,
because data associations are provided by pointers.
Hence, VDBs do not consume as much space as
relational DBs.
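
The principle of writing each piece of data once and reaching it
through pointers can be sketched with the students example. This is
only an illustration of the idea, not the storage format of any actual
vector database; the functions encode_column and rows_with are
hypothetical.

```python
def encode_column(values):
    """Store each distinct value once; rows keep integer pointers to it."""
    distinct = []   # each distinct value appears exactly once
    position = {}   # value -> index in `distinct`
    pointers = []   # one pointer per row
    for value in values:
        if value not in position:
            position[value] = len(distinct)
            distinct.append(value)
        pointers.append(position[value])
    return distinct, pointers

def rows_with(distinct, pointers, value):
    """Row numbers whose pointer designates `value`."""
    if value not in distinct:
        return []
    target = distinct.index(value)
    return [i for i, p in enumerate(pointers) if p == target]

nationality = ["French", "French", "Ukrainian", "French", "German"]
distinct, pointers = encode_column(nationality)
```

Here the string "French" is stored once instead of three times, and
the original column can be rebuilt by following the pointers.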

3.3.2 VDB -based BI 

The main principle of vector databases is that,
instead of associating dimensions with an OLAP cube,
associations are made between data. These
associations are defined during the data load process
by matching up table columns that have the same name.
The usage of vector databases differs from classical
warehousing: there is no predefinition of what a
dimension is. Any piece of data is available as a
dimension and any piece of data is available as a
measure. So, it is not necessary to reconstruct the
data schema when a dimension changes. As vector
databases work in memory, VDB-based BI tools offer
instant data access. However, enterprises frequently
hesitate to use VDB-based BI because of its lack of
interoperability with SQL tools.
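
The load-time association by column name and the free choice of dimensions can be sketched as follows in Python. This is a simplified illustration, not the mechanism of any particular tool: the tables, the column names and the helper functions find_associations and aggregate are all hypothetical, and every table is assumed to be non-empty.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical source tables, loaded as lists of records.
tables = {
    "sales": [
        {"product_id": 1, "region": "North", "amount": 120.0},
        {"product_id": 2, "region": "South", "amount": 80.0},
        {"product_id": 1, "region": "South", "amount": 50.0},
    ],
    "products": [
        {"product_id": 1, "category": "Books"},
        {"product_id": 2, "category": "Games"},
    ],
}


def find_associations(tables):
    """Link every pair of tables through the column names they share."""
    links = {}
    for left, right in combinations(sorted(tables), 2):
        shared = set(tables[left][0]) & set(tables[right][0])
        if shared:
            links[(left, right)] = sorted(shared)
    return links


def aggregate(rows, dimension, measure):
    """Sum `measure` per value of `dimension`; any column can play either role."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


links = find_associations(tables)

# Follow the association on product_id to attach the category to each sale.
category_of = {p["product_id"]: p["category"] for p in tables["products"]}
joined = [dict(row, category=category_of[row["product_id"]])
          for row in tables["sales"]]

# The analysis axis is chosen at query time, with no predefined dimension.
by_region = aggregate(joined, "region", "amount")
by_category = aggregate(joined, "category", "amount")
```

Switching the analysis axis from region to category only changes an argument of the query; the data schema itself is left untouched.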

One BI tool that uses vector database deployment
is QlikView (www.qlikview.com). QlikView provides
integrated ETL and removes the need to pre-aggregate
data. It is possible to change analysis axes at any
moment and at any level of query detail. Despite its
capabilities, QlikView has some limitations and
disadvantages, such as the lack of a unified metadata
view and of predictive models (QlikView's statistical
analysis features are less developed than in other BI
tools). It does not specialize in visualization
either: QlikView provides a clean interface to
analysts, but it lacks advanced visualization
features to help them graphically wade through
complicated data. One of QlikView's features is the
ability to connect tables automatically, but this can
create problems. When fields that represent the same
thing in different tables do not have the same name,
they must be renamed to be connected. When fields in
different tables have the same name but not the same
content and meaning, a senseless connection is
created. It is then necessary to delete this
connection and to reanalyze all the fields with the
same name in order to distinguish those with a
different meaning. QlikView also lets end-users use
the integrated ETL and construct their data schema
themselves, which often leads to unsatisfactory
results.
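
The two naming problems described above can be made concrete with a small Python sketch. It is not a QlikView feature: the tables, the helper functions and the value-overlap check are our own hypothetical illustration of how same-named fields might be screened before they are connected.

```python
def shared_fields(left_rows, right_rows):
    """Column names present in both tables (tables assumed non-empty)."""
    return set(left_rows[0]) & set(right_rows[0])


def value_overlap(left_rows, right_rows, column):
    """Fraction of the smaller set of distinct values found in both tables."""
    left = {row[column] for row in left_rows}
    right = {row[column] for row in right_rows}
    if not left or not right:
        return 0.0
    return len(left & right) / min(len(left), len(right))


def rename_field(rows, old, new):
    """Return a copy of the table with column `old` renamed to `new`."""
    return [{(new if key == old else key): value for key, value in row.items()}
            for row in rows]


# Hypothetical tables: "id" and "name" mean different things in each of them.
customers = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
products = [{"id": 1, "name": "Laptop"}, {"id": 2, "name": "Phone"}]
orders = [{"customer_id": 1, "total": 900.0}, {"customer_id": 2, "total": 400.0}]

# Same name, different meaning: an automatic link would be senseless.
name_overlap = value_overlap(customers, products, "name")
# The check is only a heuristic: unrelated numeric identifiers still overlap.
id_overlap = value_overlap(customers, products, "id")

# Different name, same meaning: rename so that customers and orders connect,
# and qualify the product fields so that they no longer collide.
customers_fixed = rename_field(customers, "id", "customer_id")
products_fixed = rename_field(rename_field(products, "id", "product_id"),
                              "name", "product_name")
```

The zero overlap on name flags a suspicious link, whereas the full overlap on id shows that such a check cannot replace a manual review of field meanings.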

3.3.3 Discussion 

Vector databases hold the same advantages as
other in-memory databases and are only limited
by memory size.

VDB-based BI is a relatively new direction, but it is
rather popular due to its fast performance, great
analysis capacity, unlimited number of dimensions,
tables and measures, and ease of implementation.
However, some of the features proposed by QlikView
are disputable: automatic table connection and the
possibility for end-users to create a data schema.
These features do not cover all situations, because
data aggregation is complex when data come from
different sources. Such data have different
refinement levels, different field names, etc.
Consequently, letting end-users create data schemas
can result in an inadequate data schema and table
connections, in data loss, as well as in the presence
of false data in the database. Moreover, VDB-based BI
tools are often black boxes, meaning that we do not
know what happens inside. Such models also lack
flexibility.

4. CONCLUSION 

Nowadays, BI has become an essential part of any
enterprise, even an SME. This necessity is caused by
the increasing volume of data indispensable for
decision making. Existing solutions and tools are
mostly aimed at large-scale enterprises; thereby they
are inaccessible or insufficient for SMEs because of
high price, redundant functionality, complexity, and
high hardware and software requirements. SMEs require
solutions with light architectures that, moreover,
are cheap and do not require additional hardware
and software.

Table 3: Relational and vector database characteristics

Characteristic            Relational DB     Vector DB
Access to data            Sequential        Parallel
Data integrity            Foreign keys      Multi-dimensional
Data relations stored in  Keys              Vectors
Data reuse                Not available     Built-in
Metadata                  System tables     None
Speed (high volume)       Slow              Fast
Uniqueness                User constraints  Built-in

This survey discusses the importance of data
warehousing for SMEs, and presents the main
characteristics and examples of web-based data
warehousing, MOLAP systems and MMDBs. All these
approaches have significant disadvantages that
prevent them from being chosen as a sole decision
support system: cumbersome architecture and
complexity in MOLAP, data vulnerability in MMDBs,
non-transparency and excessive powers granted to
users in VDB-based systems, and security issues in
cloud computing systems.

In this context, our research objective is to design
BI solutions that are suitable for SMEs and avoid
the aforementioned disadvantages.

Our idea is to work toward a ROLAP system that
operates in-memory, i.e., to add OLAP operators on
top of an SQL-based MMDB. This should considerably
simplify the in-memory OLAP architecture with respect
to MOLAP. Choosing an open source MMDB system (such
as FastDB) and using well-known ETL, modeling and
analysis processes should also help avoid the "black
box issue" of VDBs. Finally, storing business data as
close to the user as possible mitigates security
issues with respect to cloud BI. Problems will still
remain, though (e.g., data vulnerability and need for
backup, the design of adapted, in-memory indexes for
OLAP), but we are confident we can address them in
our future research.
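
To give an idea of what an OLAP operator on top of an SQL-based in-memory database might look like, the following Python sketch implements a simple roll-up with plain GROUP BY queries. It is a sketch under stated assumptions only: SQLite's in-memory mode (from the Python standard library) is used as a stand-in for an MMDB such as FastDB, whose own API is not shown here, and the sales table, its columns and the roll_up function are hypothetical.

```python
import sqlite3

# In-memory SQL database; a stand-in (assumption) for an SQL-based MMDB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (year INTEGER, region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2010, "North", "A", 100.0),
        (2010, "South", "A", 50.0),
        (2011, "North", "B", 70.0),
        (2011, "North", "A", 30.0),
    ],
)

ALLOWED_DIMENSIONS = ("year", "region", "product")


def roll_up(conn, dimensions):
    """Sum the amount measure over the given dimensions.

    An empty list of dimensions yields the grand total.
    """
    for name in dimensions:
        if name not in ALLOWED_DIMENSIONS:
            # Column names are interpolated into SQL, so only known ones pass.
            raise ValueError(f"unknown dimension: {name}")
    if not dimensions:
        return conn.execute("SELECT SUM(amount) FROM sales").fetchall()
    columns = ", ".join(dimensions)
    query = (
        f"SELECT {columns}, SUM(amount) FROM sales "
        f"GROUP BY {columns} ORDER BY {columns}"
    )
    return conn.execute(query).fetchall()


by_year_region = roll_up(conn, ["year", "region"])  # finest level shown here
by_year = roll_up(conn, ["year"])                   # rolled up over region
grand_total = roll_up(conn, [])                     # rolled up over everything
```

Each aggregation level is computed on demand from the base table, so nothing is pre-aggregated or stored in a separate cube structure.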